Anduril (workflow Engine)
   HOME

TheInfoList



OR:

Anduril is an open source component-based workflow framework for scientific data analysis developed at the Systems Biology Laboratory,
University of Helsinki The University of Helsinki ( fi, Helsingin yliopisto, sv, Helsingfors universitet, abbreviated UH) is a public research university located in Helsinki, Finland since 1829, but founded in the city of Turku (in Swedish ''Åbo'') in 1640 as the ...
. Anduril is designed to enable systematic, flexible and efficient data analysis, particularly in the field of high-throughput experiments in biomedical research. The workflow system currently provides components for several types of analysis such as
sequencing In genetics and biochemistry, sequencing means to determine the primary structure (sometimes incorrectly called the primary sequence) of an unbranched biopolymer. Sequencing results in a symbolic linear depiction known as a sequence which succ ...
,
gene expression Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product that enables it to produce end products, protein or non-coding RNA, and ultimately affect a phenotype, as the final effect. The ...
, SNP,
ChIP-on-chip ChIP-on-chip (also known as ChIP-chip) is a technology that combines chromatin immunoprecipitation ('ChIP') with DNA microarray (''"chip"''). Like regular ChIP, ChIP-on-chip is used to investigate interactions between proteins and DNA ''in vivo' ...
,
comparative genomic hybridization Comparative genomic hybridization (CGH) is a molecular cytogenetic method for analysing copy number variations (CNVs) relative to ploidy level in the DNA of a test sample compared to a reference sample, without the need for culturing cells. The ...
and exon microarray analysis as well as
cytometry Cytometry is the measurement of number and characteristics of cells. Variables that can be measured by cytometric methods include cell size, cell count, cell morphology (shape and structure), cell cycle phase, DNA content, and the existence o ...
and
cell imaging Cell most often refers to: * Cell (biology), the functional basic unit of life Cell may also refer to: Locations * Monastic cell, a small room, hut, or cave in which a religious recluse lives, alternatively the small precursor of a monastery ...
analysis.


Architecture and features

A workflow is a series of processing steps connected together so that the output of one step is used as the input of another. Processing steps implement data analysis tasks such as data importing, statistical tests and report generation. In Anduril, processing steps are implemented using components, which are reusable executable code that can be written in any programming language. Components are wired together into a workflow, or a component network, that is executed by the Anduril workflow engine. Workflow configuration is done using a simple yet powerful scripting language, AndurilScript. Workflow configuration and execution can be done from
Eclipse An eclipse is an astronomical event that occurs when an astronomical object or spacecraft is temporarily obscured, by passing into the shadow of another body or by having another body pass between it and the viewer. This alignment of three ce ...
, a popular multipurpose GUI, or from the command line. The core Anduril engine is written in Java and components are written in a variety of programming languages, including Java, R,
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...
,
Lua Lua or LUA may refer to: Science and technology * Lua (programming language) * Latvia University of Agriculture * Last universal ancestor, in evolution Ethnicity and language * Lua people, of Laos * Lawa people, of Thailand sometimes referred t ...
,
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
and
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
. Components may also have dependencies on third-party libraries, such as
Bioconductor Bioconductor is a Free software, free, Open-source software, open source and Open source software development, open development software project for the analysis and comprehension of Genome, genomic data generated by Wet laboratory, wet lab experi ...
. Components for cell imaging and microarray analysis are provided but additional components can be implemented by users. The Anduril core has been tested on Linux and Windows.


Anduril 1.0: AndurilScript language

Hello world in AndurilScript is simply std.echo("Hello world!") Commenting follows the syntax of Java: // A simple comment /* Another simple comment */ /** A description that will be included in component description */ Components are called by assigning their calls to named component instances. Names cannot be re-used within a single workflow. There are special components for input files that include external files to the script. Supported atomic types are integer, float, boolean and string, and typing is done implicitly. in1 = INPUT(path="myFile.csv") constant1 = 1 componentInstance1 = MyComponent(inputPort1 = in1, inputParam1 = constant1) Workflows are constructed by assigning outputs of component instances to inputs of following components. componentInstance2 = AnotherComponent(inputPort1 = componentInstance1.outputPort1) Component instances can also be wrapped as functions. function MyFunction(InType1 in1, ..., optional InTypeM inM, ParType1 param1, ..., ParTypeP paramP=defaultP) -> (OutType1 out1, ..., OutTypeN outN) In addition to standard if-else and switch-case statements, AndurilScript also includes for-loops. // Iterates over 1, 2, ..., 10 array = record() for i: std.range(1, 10)


Extensibility

Anduril can be extended on multiple levels. Users can add new components to existing component bundles. However, if the new component or components carry out tasks that are not related to existing bundles, users can also create new bundles.


Moksiskaan

Moksiskaan is a
data integration Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, which include both commercial (such as when two similar companies ...
framework for the
cancer research Cancer research is research into cancer to identify causes and develop strategies for prevention, diagnosis, treatment, and cure. Cancer research ranges from epidemiology, molecular bioscience to the performance of clinical trials to evaluate and ...
and
molecular biology Molecular biology is the branch of biology that seeks to understand the molecular basis of biological activity in and between cells, including biomolecular synthesis, modification, mechanisms, and interactions. The study of chemical and physi ...
. The framework provides a relational database that represents a graph of biological entities such as genes, protein, drugs, pathways, diseases, biological processes, cellular components, and molecular functions. In addition, there is a wide set of analysis and accession tools built on top of this data. The great majority of these tools are implemented as Anduril components and functions. Moksiskaan is used mainly to interpret lists of
candidate gene The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within pre-specified genes of interest, and phenotypes or disease states. This is in contrast to genome-wide association studies ...
s obtained from the genomic studies. Its tools can be used to generate graphs of biological entities related to the input genes. The exact for of these graphs may vary from the drug target predictions to the
time series In mathematics, a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. Exa ...
of signalling cascades. Some of the goals of these tools are closely related to
IPA IPA commonly refers to: * India pale ale, a style of beer * International Phonetic Alphabet, a system of phonetic notation * Isopropyl alcohol, a chemical compound IPA may also refer to: Organizations International * Insolvency Practitioners ...
.


See also

*
Bioinformatics workflow management systems A bioinformatics workflow management system is a specialized form of workflow management system designed specifically to compose and execute a series of computational or data manipulation steps, or a workflow, that relate to bioinformatics. Ther ...
*
GenePattern GenePattern is a freely available computational biology open-source software package originally created and developed at the Broad Institute for the analysis of genomic data. Designed to enable researchers to develop, capture, and reproduce geno ...
*
Kepler Johannes Kepler (; ; 27 December 1571 – 15 November 1630) was a German astronomer, mathematician, astrologer, natural philosopher and writer on music. He is a key figure in the 17th-century Scientific Revolution, best known for his laws o ...
*
Apache Taverna Apache Taverna was an open source software tool for designing and executing workflows, initially created by the myGrid project under the name ''Taverna Workbench'', then a project under the Apache incubator. Taverna allowed users to integrate many ...
*
Workflow management system A workflow management system (WfMS or WFMS) provides an infrastructure for the set-up, performance and monitoring of a defined sequence of tasks, arranged as a workflow application. International standards There are several international standards- ...


References


Further reading


Scientists develop new database that provides comprehensive view of Glioblastoma Multiforme genome
in the Cancer Genome Atlas Research Briefs, March 2011, by Catherine Evans. * * * * * * * * * *{{cite journal , author=Louhimo R., Hautaniemi S. , title=CNAmet: an R package for integrating copy number, methylation and expression data, journal=Bioinformatics , volume=27, issue=6, pages=887–888, year=2011 , doi=10.1093/bioinformatics/btr019 , pmid=21228048, doi-access=free


External links


Official Anduril websiteAnduril Code repositoryOfficial Moksiskaan website
Workflow applications Bioinformatics software